Quantitative Games on Probabilistic Timed Automata 



Marta Kwiatkowska, Gethin Norman, and Ashutosh Trivedi 
Oxford University Computing Laboratory, Parks Road, Oxford, OX1 3QD 



Abstract. Two-player zero-sum games are a well-established model for syn- 
thesising controllers that optimise some performance criterion. In such games 
one player represents the controller, while the other describes the (adversarial) 
environment, and controller synthesis corresponds to computing the optimal 
strategies of the controller for a given criterion. Asarin and Maler initiated the 
study of quantitative games on (non-probabilistic) timed automata by synthesis- 
ing controllers which optimise the time to reach a final state. The correctness 
and termination of their approach was dependent on exploiting the properties of a 
special class of functions, called simple functions, that can be finitely represented. 
In this paper we consider quantitative games over probabilistic timed automata. 
Since the concept of simple functions is not sufficient to solve games in this 
setting, we generalise simple functions to so-called quasi-simple functions. Then, 
using this class of functions, we demonstrate that the problem of solving games 
with either expected reachability-time or expected discounted-time criteria on 
probabilistic timed automata is in NEXPTIME ∩ co-NEXPTIME. 

1 Introduction 

Two-player zero-sum games on finite automata, as a mechanism for supervisory 
controller synthesis of discrete event systems, were introduced by Ramadge and 
Wonham [24]. In this setting the two players, called Min and Max, represent 
the 'controller' and the 'environment' and control-program synthesis corresponds to 
finding a winning (or optimal) strategy of the 'controller' for some given performance 
objective. If the objectives are dependent on time, e.g. when the objective corresponds 
to completing a given set of tasks within some deadline, then games on timed automata 
are a well-established approach for controller synthesis, see for example [3, 18, 6]. 

In this paper we extend this approach to systems which are quantitative in 
terms of time and probabilistic behaviour. Probabilistic information is important for 
modelling, e.g., faulty or unreliable components, the random coin flips of distributed 
communication and security protocols, and performance characteristics. We consider 
games on probabilistic timed automata [22, 15, 5], a modelling framework for real-time 
systems exhibiting both nondeterministic and probabilistic behaviour. We concentrate 
on expected reachability-time games, where the performance objective concerns the 
expected minimum time the controller can ensure for the system to reach a target, 
regardless of uncontrollable (environmental) events. This approach has many practical 
applications, including job-shop scheduling, where machines can be faulty or have 
variable execution times, and both routing and task graph scheduling problems, where 
both time and stochastic behaviour are relevant. We also discuss discounted-time 
games where, intuitively, at each transition the system breaks down with some 
non-zero probability, and the players try to optimise the expected time to breakdown. 

Contributions. Our approach is inspired by the work of Asarin and Maler [3] who 
initiated the study of quantitative games on (non-probabilistic) timed automata. Their 
results were dependent on exploiting the properties of a special class of functions, called 
simple functions, that can be finitely represented. Since the concept of simple functions 
is not sufficient to solve games in this setting, we generalise simple functions to so- 
called quasi-simple functions. Using this class of functions and the boundary region 
graph construction [20], we demonstrate that the problem of solving games with either 
expected reachability-time or expected discounted-time criteria on probabilistic timed 
automata is in NEXPTIME ∩ co-NEXPTIME. 

Related Work. Hoffmann and Wong-Toi [14] were the first to define and solve the optimal 
controller synthesis problem for timed automata. For a detailed introduction to the topic 
of qualitative games on timed automata, see e.g. [4]. Asarin and Maler initiated the 
study of quantitative games on timed automata by providing a symbolic algorithm to 
solve reachability-time games. The works of [10] and [18] showed that the decision 
version of the reachability-time game is EXPTIME-complete for timed automata with 
at least two clocks. For average-time objectives, Jurdziński and Trivedi [19] showed the 
EXPTIME-completeness of the problem for timed automata with two or more clocks. 

A natural extension of reachability-time games for timed automata is reachability- 
price games for priced timed automata. Alur, Bernadsky, and Madhusudan [1] and 
Bouyer et al. [8] gave semi-algorithms to compute the value of reachability-price games 
on linearly-priced timed automata. In [11] and [7] it was shown that checking the 
existence of optimal strategies in a reachability-price game is undecidable for automata 
with three clocks and stopwatch prices. 

We are not aware of any previous work studying games on probabilistic timed 
automata. For a significantly different model of stochastic timed games, [9] show that 
deciding whether a target is reachable within a given probability bound is undecidable. 
Regarding one-player games on probabilistic timed automata, [16] shows that a number 
of one-player optimisation problems on concavely-priced probabilistic timed automata 
can be reduced to solving corresponding problems on the boundary region graph. We 
also mention [21], based on the digital clocks approach [13], which solves expected-time 
(and expected-cost) reachability for a subclass of probabilistic timed automata. 

2 Preliminaries 

We begin by presenting the background material required in the remainder of the paper. 
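We assume the sets $\mathbb{N}$ of non-negative integers, $\mathbb{R}$ of reals and $\mathbb{R}_{\oplus}$ of non-negative 
reals. For $n \in \mathbb{N}$, let $[n]_{\mathbb{N}}$ and $[n]_{\mathbb{R}}$ denote the sets $\{0, 1, \ldots, n\}$ and $\{r \in \mathbb{R} \mid 0 \leq r \leq n\}$, 
respectively. For $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$, we define $\|x\|_{\infty} = \max\{|x_i| \mid 1 \leq i \leq n\}$. 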

Probability distributions. A discrete probability distribution over a countable set $Q$ 
is a function $\mu : Q \to [0, 1]$ such that $\sum_{q \in Q} \mu(q) = 1$. For a possibly uncountable set 
$Q'$, we define $\mathcal{D}(Q')$ to be the set of functions $\mu : Q' \to [0, 1]$ such that the set 
$\mathrm{supp}(\mu) = \{q \in Q' \mid \mu(q) > 0\}$ is countable and, over $\mathrm{supp}(\mu)$, $\mu$ is a distribution. We 
say that $\mu \in \mathcal{D}(Q)$ is a point distribution if $\mu(q) = 1$ for some $q \in Q$. 

Markov decision processes. We next introduce Markov decision processes, a modelling 
formalism for systems exhibiting nondeterministic and probabilistic behaviour. 

Definition 1. A Markov decision process (MDP) is a tuple $\mathcal{M} = (S, F, A, p, \pi)$ where: 

- $S$ is the set of states including a set of final states $F$; 

- $A$ is the set of actions; 

- $p : S \times A \to \mathcal{D}(S)$ is a partial function called the probabilistic transition function; 

- $\pi : S \times A \to \mathbb{R}_{\oplus}$ is the reward function. 

We write $A(s)$ for the set of actions available at $s$, i.e., the set of actions $a$ for 
which $p(s, a)$ is defined. In an MDP $\mathcal{M}$, if the current state is $s$, then there is a 
nondeterministic choice between the actions in $A(s)$ and, if action $a$ is chosen, the 
probability of reaching the state $s' \in S$ equals $p(s' \mid s, a) = p(s, a)(s')$. 
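
As an illustration only, a finite MDP fragment can be written down directly from Definition 1. The sketch below assumes a dictionary encoding of the partial function $p$; the state and action names and the helper functions `available_actions` and `transition_probability` are hypothetical and do not come from the paper.

```python
from fractions import Fraction

# Hypothetical finite example: p is a *partial* function, so only the
# (state, action) pairs that appear as keys are defined.
p = {
    ("s0", "a"): {"s0": Fraction(1, 2), "s1": Fraction(1, 2)},
    ("s0", "b"): {"s1": Fraction(1)},
}


def available_actions(p, state):
    """A(s): the set of actions a for which p(s, a) is defined."""
    return {a for (s, a) in p if s == state}


def transition_probability(p, s_next, s, a):
    """p(s' | s, a) = p(s, a)(s'); states outside the support get probability 0."""
    return p[(s, a)].get(s_next, Fraction(0))
```

In this encoding a state with no defined action, such as `s1`, simply has an empty set $A(s)$.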

Clocks, clock valuations, regions and zones. We fix a constant $k \in \mathbb{N}$ and a finite set 
of clocks $\mathcal{C}$. A ($k$-bounded) clock valuation is a function $\nu : \mathcal{C} \to [k]_{\mathbb{R}}$ and we write $V$ 
for the set of clock valuations. 

Assumption 1. Although clocks are usually allowed to take arbitrary non-negative 
values, we have restricted their values to be bounded by the constant k. This restriction 
is for technical convenience and comes without significant loss of generality. 

If $\nu \in V$ and $t \in \mathbb{R}_{\oplus}$ then we write $\nu + t$ for the clock valuation defined by $(\nu + t)(c) = 
\nu(c) + t$, for all $c \in \mathcal{C}$. For $C \subseteq \mathcal{C}$, we write $\nu[C := 0]$ for the clock valuation where 
$\nu[C := 0](c) = 0$ if $c \in C$, and $\nu[C := 0](c) = \nu(c)$ otherwise. For $X \subseteq V$, we write $\overline{X}$ 
for the smallest closed set in $V$ containing $X$. Let $X \subseteq V$ be a convex subset of clock 
valuations and let $F : X \to \mathbb{R}$ be a continuous function. We write $\overline{F}$ for the unique 
continuous function $F' : \overline{X} \to \mathbb{R}$ such that $F'(\nu) = F(\nu)$ for all $\nu \in X$. 
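
The two operations on valuations, time elapse $\nu + t$ and clock reset $\nu[C := 0]$, can be sketched as follows. This is a minimal illustration assuming valuations are stored as dictionaries from clock names to exact rationals; the function names `delay` and `reset` are hypothetical, and the sketch does not enforce the bound $k$.

```python
from fractions import Fraction


def delay(valuation, t):
    """Return v + t: every clock is advanced by the same amount t."""
    return {clock: value + t for clock, value in valuation.items()}


def reset(valuation, clocks):
    """Return v[C := 0]: clocks in `clocks` are set to 0, the others are unchanged."""
    return {
        clock: (Fraction(0) if clock in clocks else value)
        for clock, value in valuation.items()
    }


# Let half a time unit pass, then reset clock y.
v = {"x": Fraction(3, 10), "y": Fraction(1)}
after = reset(delay(v, Fraction(1, 2)), {"y"})
```

Exact rationals are used rather than floats so that comparisons against integer bounds, which determine regions later on, are not affected by rounding.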

The set of clock constraints over $\mathcal{C}$ is the set of conjunctions of simple constraints, 
which are constraints of the form $c \bowtie i$ or $c - c' \bowtie i$, where $c, c' \in \mathcal{C}$, $i \in [k]_{\mathbb{N}}$ and 
$\bowtie \in \{<, >, =, \leq, \geq\}$. For every $\nu \in V$, let $\mathrm{SCC}(\nu)$ be the set of simple constraints 
which hold in $\nu$. A clock region is a maximal set $\zeta \subseteq V$ such that $\mathrm{SCC}(\nu) = \mathrm{SCC}(\nu')$ 
for all $\nu, \nu' \in \zeta$. Every clock region is an equivalence class of the indistinguishability-by-clock-constraints 
relation, and vice versa. Note that $\nu$ and $\nu'$ are in the same clock 
region if and only if the integer parts of the clocks and the partial orders of the clocks, 
determined by their fractional parts, are the same in $\nu$ and $\nu'$. We write $[\nu]$ for the clock 
region of $\nu$ and, if $\zeta = [\nu]$, write $\zeta[C := 0]$ for the clock region $[\nu[C := 0]]$. 
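
Region equivalence can be tested directly from the definition of $\mathrm{SCC}(\nu)$. The sketch below is an illustration under stated assumptions: valuations are dictionaries over the same clock names with exact rational values, and the helper names `sign`, `region_signature` and `same_region` are hypothetical. It records, for each simple constraint's left-hand side and each integer $i \in [k]_{\mathbb{N}}$, only the sign of the difference, since that sign determines the truth of all five comparisons.

```python
from fractions import Fraction
from itertools import permutations


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def region_signature(valuation, k):
    """Summarise which simple constraints (c ~ i and c - c' ~ i) hold in a valuation."""
    clocks = sorted(valuation)
    signature = []
    for c in clocks:
        for i in range(k + 1):
            signature.append(sign(valuation[c] - i))
    for c, d in permutations(clocks, 2):
        for i in range(k + 1):
            signature.append(sign(valuation[c] - valuation[d] - i))
    return tuple(signature)


def same_region(v1, v2, k):
    """Two valuations lie in the same clock region iff they satisfy the same simple constraints."""
    return region_signature(v1, k) == region_signature(v2, k)
```

For example, with $k = 1$ the valuations $(x, y) = (0.3, 0.7)$ and $(0.4, 0.9)$ agree on every simple constraint, whereas $(0.7, 0.3)$ reverses the order of the fractional parts and so lies in a different region.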

A clock zone is a convex set of clock valuations, which is a union of a set of clock 
regions. We write $\mathcal{Z}$ for the set of clock zones. For any clock zone $W$ and clock 
valuation $\nu$, we use the notation $\nu \in W$ to denote that $[\nu] \in W$. A set of clock 
valuations is a clock zone if and only if it is definable by a clock constraint. Observe 
that, for every clock zone $W$, the set $\overline{W}$ is also a clock zone. 

3 Stochastic Games on Probabilistic Timed Automata 

In this section we introduce stochastic games played on probabilistic timed automata. 

Probabilistic timed automata. Probabilistic timed automata are a modelling 
framework for real-time systems exhibiting both nondeterministic and probabilistic 
behaviour. The formalism is derived by extending classical timed automata [2] with 
discrete probability distributions over edges. 

Definition 2 (PTA syntax). A probabilistic timed automaton (PTA) is a tuple $\mathsf{T} = 
(L, L_F, \mathcal{C}, \mathit{Inv}, \mathit{Act}, E, \delta)$ where: 

- $L$ is the finite set of locations including the set of final locations $L_F$; 

- $\mathcal{C}$ is the finite set of clocks; 

- $\mathit{Inv} : L \to \mathcal{Z}$ is the invariant condition; 

- $\mathit{Act}$ is the finite set of actions; 

- $E : L \times \mathit{Act} \to \mathcal{Z}$ is the action enabledness function; 

- $\delta : (L \times \mathit{Act}) \to \mathcal{D}(2^{\mathcal{C}} \times L)$ is the transition probability function. 

A timed automaton is a PTA with the property that $\delta(\ell, a)$ is a point distribution for 
all $\ell \in L$ and $a \in \mathit{Act}$. When we consider a PTA as an input of an algorithm, its 
size should be understood as the sum of the sizes of encodings of $L$, $\mathcal{C}$, $\mathit{Inv}$, $\mathit{Act}$, 
$E$, and $\delta$. As usual [17], we assume that probabilities are expressed as ratios of two 
natural numbers, each written in binary. In addition, we assume the following standard 
restriction on PTAs which ensures time-divergent behaviour. 

Assumption 2. We restrict attention to structurally non-Zeno PTAs [26, 17]. 

A configuration of a PTA $\mathsf{T}$ is a pair $(\ell, \nu)$, where $\ell \in L$ is a location and $\nu \in V$ 
is a clock valuation over $\mathcal{C}$ such that $\nu \in \mathit{Inv}(\ell)$. For any $t \in \mathbb{R}$, we let $(\ell, \nu) + t$ 
equal the configuration $(\ell, \nu + t)$. Informally, the behaviour of a PTA is as follows. In 
configuration $(\ell, \nu)$ time passes before an available action is triggered, after which a 
discrete probabilistic transition occurs. Time passage is available only if the invariant 
condition $\mathit{Inv}(\ell)$ is satisfied while time elapses, and an action $a$ can be chosen after 
time $t$ elapses only if it is enabled after the time elapse, i.e., if $\nu + t \in E(\ell, a)$. Both the 
time and the action chosen are nondeterministic. If the action $a$ is chosen, then the 
probability of moving to the location $\ell'$ and resetting all of the clocks in $C$ to 0 is given 
by $\delta[\ell, a](C, \ell')$. 

Formally, the semantics of a PTA is given by an MDP which has both an infinite 
number of states and an infinite number of transitions. 

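Definition 3 (PTA semantics). Let $\mathsf{T} = (L, L_F, \mathcal{C}, \mathit{Inv}, \mathit{Act}, E, \delta)$ be a PTA. The 
semantics of $\mathsf{T}$ is the MDP $[\![\mathsf{T}]\!] = (S, F, A, p, \pi)$ where 

- $S \subseteq L \times V$, the set of states, is such that $(\ell, \nu) \in S$ if and only if $\nu \in \mathit{Inv}(\ell)$; 

- $F = S \cap (L_F \times V)$ is the set of final states; 

- $A = \mathbb{R}_{\oplus} \times \mathit{Act}$ is the set of timed actions; 

- $p : S \times A \to \mathcal{D}(S)$ is the probabilistic transition function such that for $(\ell, \nu) \in S$ 
and $(t, a) \in A$, we have $p((\ell, \nu), (t, a)) = \mu$ if and only if 

• $\nu + t' \in \mathit{Inv}(\ell)$ for all $t' \in [0, t]$; 

• $\nu + t \in E(\ell, a)$; 

• $\mu(\ell', \nu') = \sum_{C \subseteq \mathcal{C} \,\wedge\, (\nu + t)[C := 0] = \nu'} \delta[\ell, a](C, \ell')$ for all $(\ell', \nu') \in S$; 

- $\pi : S \times A \to \mathbb{R}_{\oplus}$ is the reward function where $\pi(s, (t, a)) = t$ for $s \in S$ and $(t, a) \in A$. 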
In the rest of the paper, for the sake of notational convenience, we often write v) 

Probabilistic Timed Game Arena. We are now in a position to introduce probabilistic 
timed game arenas. 

Definition 4. A probabilistic timed game arena is a triplet $\Gamma = (\mathsf{T}, L_{Min}, L_{Max})$ where 
$\mathsf{T} = (L, L_F, \mathcal{C}, \mathit{Inv}, \mathit{Act}, E, \delta)$ is a PTA and $(L_{Min}, L_{Max})$ is a partition of $L$. 

The semantics of a probabilistic timed game arena $\Gamma$ is the stochastic game arena $[\![\Gamma]\!] = 
([\![\mathsf{T}]\!], S_{Min}, S_{Max})$ where $[\![\mathsf{T}]\!] = (S, F, A, p, \pi)$ is the semantics of $\mathsf{T}$, $S_{Min} = S \cap 
(L_{Min} \times V)$ and $S_{Max} = S \setminus S_{Min}$. Intuitively, $S_{Min}$ is the set of states controlled by player 
Min, and $S_{Max}$ is the set of states controlled by player Max. 

In a turn-based game on $\Gamma$, players Min and Max move a token along the states of 
the PTA in the following manner. If the current state is $s$, then the player controlling the 
state chooses an action $(t, a) \in A(s)$, after which state $s' \in S$ is reached with probability 
$p(s' \mid s, (t, a))$. In the next turn the player controlling the state $s'$ chooses an action in $A(s')$ 
and a probabilistic transition is made accordingly. 

We say that $(s, (t, a), s')$ is a transition in $\Gamma$ if $p(s' \mid s, (t, a)) > 0$, and a play of $\Gamma$ 
is a sequence $\langle s_0, (t_1, a_1), s_1, \ldots \rangle \in S \times (A \times S)^*$ such that $(s_i, (t_{i+1}, a_{i+1}), s_{i+1})$ is a 
transition for all $i \geq 0$. We write Play (FPlay) for the sets of infinite (finite) plays and 
$\mathrm{Play}_s$ ($\mathrm{FPlay}_s$) for the sets of infinite (finite) plays starting from state $s$. For a finite play 
$r$, let $\mathrm{last}(r)$ denote the last state of the play. Let $X_i$ and $Y_i$ denote the random variables 
corresponding to the $i$-th state and action of a play. 

A strategy of player Min in $\Gamma$ is a partial function $\mu : \mathrm{FPlay} \to \mathcal{D}(A)$, defined for 
$r \in \mathrm{FPlay}$ if and only if $\mathrm{last}(r) \in S_{Min}$, such that $\mathrm{supp}(\mu(r)) \subseteq A(\mathrm{last}(r))$. Strategies 
of player Max are defined analogously. We write $\Sigma_{Min}$ and $\Sigma_{Max}$ for the sets of strategies 
of players Min and Max, respectively. Let $\mathrm{Play}_s^{\mu,\chi}$ denote the subset of $\mathrm{Play}_s$ which 
corresponds to the set of plays in which the players play according to $\mu \in \Sigma_{Min}$ and $\chi \in 
\Sigma_{Max}$, respectively. A strategy $\sigma$ is pure if $\sigma(r)$ is a point distribution for all $r \in \mathrm{FPlay}$ 
for which it is defined, while it is stationary if $\mathrm{last}(r) = \mathrm{last}(r')$ implies $\sigma(r) = \sigma(r')$ for 
all $r, r' \in \mathrm{FPlay}$. 

To analyse the behaviour of a stochastic game on $\Gamma$ under a strategy pair $(\mu, \chi)$, 
for every state $s$ of $\Gamma$ we define a probability space $(\mathrm{Play}_s^{\mu,\chi}, \mathcal{F}_{\mathrm{Play}_s^{\mu,\chi}}, \mathrm{Prob}_s^{\mu,\chi})$ over 
the set of infinite plays under strategies $\mu$ and $\chi$ with $s$ as the initial state. Given a 
real-valued random variable $f : \mathrm{Play} \to \mathbb{R}$, we can then define the expectation of this 
variable $\mathbb{E}_s^{\mu,\chi}\{f\}$ with respect to the strategy pair $(\mu, \chi)$ when starting in $s$. 

For technical convenience we make the following standard [23] assumption (a 
similar assumption is required for the optimal expected reachability-price problem for finite 
MDPs [12]): 

Assumption 3. For every strategy pair $\mu \in \Sigma_{Min}$, $\chi \in \Sigma_{Max}$, and state $s \in S$ we have 
that $\lim_{i \to \infty} \mathrm{Prob}_s^{\mu,\chi}(X_i \in F) = 1$. 

Expected Reachability-Time Game. In an expected reachability-time game on $\Gamma = 
(\mathsf{T}, L_{Min}, L_{Max})$ player Min attempts to reach the final states as quickly as possible, 
while the objective of player Max is the opposite. More precisely, Min is interested 
in minimising her losses, while player Max is interested in maximising his winnings 
where, if player Min uses the strategy $\mu \in \Sigma_{Min}$ and player Max uses the strategy 
$\chi \in \Sigma_{Max}$, player Min loses the following amount to player Max: 

\[ \mathrm{EReach}(s, \mu, \chi) \;\stackrel{\text{def}}{=}\; \mathbb{E}_s^{\mu,\chi} \Big\{ \sum_{i=0}^{\min\{i \,\mid\, X_i \in F\}} \pi(X_i, Y_{i+1}) \Big\}. \]

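Observe that player Max can choose his actions to win at least an amount arbitrarily 
close to $\sup_{\chi \in \Sigma_{Max}} \inf_{\mu \in \Sigma_{Min}} \mathrm{EReach}(s, \mu, \chi)$. This is called the lower value $\underline{\mathrm{Val}}(s)$ of 
the expected reachability-time game starting at $s$: 

\[ \underline{\mathrm{Val}}(s) = \sup_{\chi \in \Sigma_{Max}} \inf_{\mu \in \Sigma_{Min}} \mathrm{EReach}(s, \mu, \chi). \]

Similarly, player Min can choose to lose at most an amount arbitrarily close to 
$\inf_{\mu \in \Sigma_{Min}} \sup_{\chi \in \Sigma_{Max}} \mathrm{EReach}(s, \mu, \chi)$. This is called the upper value $\overline{\mathrm{Val}}(s)$ of the game: 

\[ \overline{\mathrm{Val}}(s) = \inf_{\mu \in \Sigma_{Min}} \sup_{\chi \in \Sigma_{Max}} \mathrm{EReach}(s, \mu, \chi). \]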

It is straightforward to verify that $\underline{\mathrm{Val}}(s) \leq \overline{\mathrm{Val}}(s)$ for all $s \in S$. We say that the 
expected reachability-time game is determined if $\underline{\mathrm{Val}}(s) = \overline{\mathrm{Val}}(s)$ for all $s \in S$. In 
this case we also say that the value of the game exists and denote it by $\mathrm{Val}(s) = 
\underline{\mathrm{Val}}(s) = \overline{\mathrm{Val}}(s)$ for all $s \in S$. The results of this paper present a proof of the following 
proposition. 

Proposition 5. Expected reachability-time games are determined. 

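For $\mu \in \Sigma_{Min}$ and $\chi \in \Sigma_{Max}$ we define $\mathrm{Val}^{\mu}(s) = \sup_{\chi \in \Sigma_{Max}} \mathrm{EReach}(s, \mu, \chi)$ and 
$\mathrm{Val}^{\chi}(s) = \inf_{\mu \in \Sigma_{Min}} \mathrm{EReach}(s, \mu, \chi)$. For an $\varepsilon > 0$, we say that $\mu \in \Sigma_{Min}$ or $\chi \in \Sigma_{Max}$ 
is $\varepsilon$-optimal if $\mathrm{Val}^{\mu}(s) \leq \mathrm{Val}(s) + \varepsilon$ or $\mathrm{Val}^{\chi}(s) \geq \mathrm{Val}(s) - \varepsilon$, respectively, for all $s \in S$. 
If an expected reachability-time game is determined, then for every $\varepsilon > 0$, both players 
have $\varepsilon$-optimal strategies. 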
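
The inequality between the lower and upper values can be seen on a toy example. The sketch below is only an illustration: it assumes a finite table `payoff[i][j]` standing in for $\mathrm{EReach}(s, \mu_i, \chi_j)$ over finitely many hypothetical strategies, which is far simpler than the strategy sets $\Sigma_{Min}$ and $\Sigma_{Max}$ of the games considered here.

```python
def lower_value(payoff):
    """sup over Max's choices (columns) of the inf over Min's choices (rows)."""
    n_cols = len(payoff[0])
    return max(min(row[j] for row in payoff) for j in range(n_cols))


def upper_value(payoff):
    """inf over Min's choices (rows) of the sup over Max's choices (columns)."""
    return min(max(row) for row in payoff)


# Rows are Min's options, columns are Max's options (hypothetical numbers).
payoff = [
    [1, 3],
    [4, 2],
]
```

In this table the two values differ, so the restricted toy game is not determined; Proposition 5 states that no such gap arises for expected reachability-time games over the full strategy sets.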

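Optimality Equations. We now review optimality equations for characterising the 
value in an expected reachability-time game. Let $\Gamma$ be a probabilistic timed game arena 
and let $P : S \to \mathbb{R}_{\oplus}$. We say that $P$ is a solution of optimality equations $\mathrm{Opt}(\Gamma)$, and 
we write $P \models \mathrm{Opt}(\Gamma)$ if, for all $s \in S$: 

\[ P(s) = \begin{cases} 0 & \text{if } s \in F \\ \inf_{(t,a) \in A(s)} \big\{ t + \sum_{s' \in S} p(s' \mid s, (t, a)) \cdot P(s') \big\} & \text{if } s \in S_{Min} \setminus F \\ \sup_{(t,a) \in A(s)} \big\{ t + \sum_{s' \in S} p(s' \mid s, (t, a)) \cdot P(s') \big\} & \text{if } s \in S_{Max} \setminus F. \end{cases} \]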

Under Assumption 3 the proof of the following proposition is routine; for details 
see, for example, [18]. 

Proposition 6. If $P \models \mathrm{Opt}(\Gamma)$, then $\mathrm{Val}(s) = P(s)$ for all $s \in S$ and, for every $\varepsilon > 0$, 
both players have pure $\varepsilon$-optimal strategies. 

Using Proposition 6 it follows that the problem of solving an expected reachability-time 
game on $\Gamma$ can be reduced to solving the optimality equations $\mathrm{Opt}(\Gamma)$. In the 
non-probabilistic setting, Jurdziński and Trivedi [18] showed that solving optimality 
equations for a reachability-time game on a (non-probabilistic) timed automaton can 
be reduced to solving a reachability-price game on an abstraction, called the boundary 
region graph. Recently [16], we extended this result, reducing a number of one-player 
optimisation problems on probabilistic timed automata to solving corresponding 
problems on boundary region graphs. In the next section, we review the boundary region 
graph abstraction for probabilistic timed automata and, in Section 6, we argue that 
the boundary region graph abstraction for probabilistic timed automata is sufficient to solve 
expected reachability-time games. In Section 7 we explain that expected discounted-time 
games on probabilistic timed automata can also be reduced to solving discounted-price 
games on their boundary region graph. In Section 8 we discuss some implications 
of these reductions for the complexity of the decision problems related to these games. 

4 The Boundary Region Graph Abstraction 

In this section we review the boundary region graph for PTAs introduced in [16]. 

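Regions. A region is a pair $(\ell, \zeta)$, where $\ell$ is a location and $\zeta$ is a clock region such that 
$\zeta \subseteq \mathit{Inv}(\ell)$. For any $s = (\ell, \nu)$, we write $[s]$ for the region $(\ell, [\nu])$ and $\mathcal{R}$ for the set of 
regions. A set $Z \subseteq L \times V$ is a zone if, for every $\ell \in L$, there is a clock zone $W_{\ell}$ (possibly 
empty), such that $Z = \{(\ell, \nu) \mid \ell \in L \wedge \nu \in W_{\ell}\}$. For a region $R = (\ell, \zeta) \in \mathcal{R}$, we write 
$\overline{R}$ for the zone $\{(\ell, \nu) \mid \nu \in \overline{\zeta}\}$; recall that $\overline{\zeta}$ is the smallest closed set in $V$ containing $\zeta$. 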

For $R, R' \in \mathcal{R}$, we say that $R'$ is in the future of $R$, or that $R$ is in the past of $R'$, 
if there are $s \in R$, $s' \in R'$ and $t \in \mathbb{R}_{\oplus}$ such that $s' = s + t$; we then write $R \to_{*} R'$. 
We say that $R'$ is the time successor of $R$ if $R \to_{*} R'$, $R \neq R'$, and $R \to_{*} R'' \to_{*} R'$ 
implies $R'' = R$ or $R'' = R'$; we then write $R \to_{+1} R'$ and $R' \leftarrow_{+1} R$. 

We say that a region $R \in \mathcal{R}$ is thin if $[s] \neq [s + \varepsilon]$ for every $s \in R$ and $\varepsilon > 0$; other 
regions are called thick. We write $\mathcal{R}_{Thin}$ and $\mathcal{R}_{Thick}$ for the sets of thin and thick regions, 
respectively. Note that if $R \in \mathcal{R}_{Thick}$ then, for every $s \in R$, there is an $\varepsilon > 0$ such that 
$[s] = [s + \varepsilon]$. Observe that the time successor of a thin region is thick, and vice versa. 

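We say $(\ell, \nu) \in L \times V$ is in the closure of the region $(\ell, \zeta)$, and we write $(\ell, \nu) \in 
\overline{(\ell, \zeta)}$, if $\nu \in \overline{\zeta}$. For any $\nu \in V$, $b \in [k]_{\mathbb{N}}$ and $c \in \mathcal{C}$ such that $\nu(c) \leq b$, we let 
$\mathrm{time}(\nu, (b, c)) = b - \nu(c)$. Intuitively, $\mathrm{time}(\nu, (b, c))$ returns the amount of time that 
must elapse in $\nu$ before the clock $c$ reaches the integer value $b$. Note that, for any 
$(\ell, \nu) \in L \times V$ and $a \in \mathit{Act}$, if $t = \mathrm{time}(\nu, (b, c))$ is defined, then $(\ell, [\nu + t]) \in \mathcal{R}_{Thin}$ 
and $\mathrm{supp}(p(\cdot \mid (\ell, \nu), (t, a))) \subseteq \mathcal{R}_{Thin}$. Observe that, for every $R' \in \mathcal{R}_{Thin}$, there is a 
number $b \in [k]_{\mathbb{N}}$ and a clock $c \in \mathcal{C}$ such that, for every $R \in \mathcal{R}$ in the past of $R'$, we 
have that $s \in R$ implies $(s + (b - s(c))) \in R'$; we then write $R \to_{b,c} R'$. 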

The Boundary Region Graph. The boundary region graph is motivated by the 
following. Consider any $a \in \mathit{Act}$, $s = (\ell, \nu)$ and $R = (\ell, \zeta) \to_{*} R' = (\ell, \zeta')$ such that 
$s \in R$ and $R' \in E(\ell, a)$. 

- If $R' \in \mathcal{R}_{Thick}$, then there are infinitely many $t \in \mathbb{R}_{\oplus}$ such that $s + t \in R'$. 
However, amongst all such $t$'s, for one of the boundaries of $R'$, the closer $\nu + t$ is to 
this boundary, the 'better' the timed action $(t, a)$ becomes for a player's objective. 
Since $R'$ is a thick region, the set $\{t \in \mathbb{R}_{\oplus} \mid s + t \in R'\}$ is an open interval, 
and hence does not contain its boundary values. Observe that the infimum equals 
$b_{-} - \nu(c_{-})$ where $R \to_{b_{-},c_{-}} R_{-} \to_{+1} R'$ and the supremum equals $b_{+} - \nu(c_{+})$ 
where $R \to_{b_{+},c_{+}} R_{+} \leftarrow_{+1} R'$. In the boundary region graph we include these 
'best' timed actions through the actions $((b_{-}, c_{-}, a), R')$ and $((b_{+}, c_{+}, a), R')$. 

- If $R' \in \mathcal{R}_{Thin}$, then there exists a unique $t \in \mathbb{R}_{\oplus}$ such that $(\ell, \nu + t) \in R'$. Moreover, 
since $R'$ is a thin region, there exists a clock $c \in \mathcal{C}$ and a number $b \in \mathbb{N}$ such that 
$R \to_{b,c} R'$ and $t = b - \nu(c)$. In the boundary region graph we summarise this 'best' 
timed action from region $R$ via region $R'$ through the action $((b, c, a), R')$. 
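
To make the thick case concrete, consider a single clock $x$ with $\nu(x) = 0.3$ and the thick target region $1 < x < 2$. The sketch below computes the two boundary delays via $\mathrm{time}(\nu, (b, c)) = b - \nu(c)$; the dictionary encoding of valuations and the function name `time_to_boundary` are assumptions made for this illustration.

```python
from fractions import Fraction


def time_to_boundary(valuation, b, c):
    """time(v, (b, c)) = b - v(c), defined only when v(c) <= b."""
    if valuation[c] > b:
        raise ValueError("time(v, (b, c)) is undefined when v(c) > b")
    return b - valuation[c]


v = {"x": Fraction(3, 10)}

# Delays t with 1 < v(x) + t < 2 form the open interval (7/10, 17/10).
inf_delay = time_to_boundary(v, 1, "x")  # boundary action (b_-, c_-) = (1, x)
sup_delay = time_to_boundary(v, 2, "x")  # boundary action (b_+, c_+) = (2, x)
```

Neither boundary delay lands inside the thick region itself, which is why the boundary region graph records the target region $R'$ alongside the pair $(b, c)$.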

Based on this intuition the boundary region graph is defined as follows. 

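Definition 7. Let $\mathsf{T} = (L, L_F, \mathcal{C}, \mathit{Inv}, \mathit{Act}, E, \delta)$ be a PTA. The boundary region graph 
of $\mathsf{T}$ is defined as the MDP $\widehat{\mathsf{T}} = (\widehat{S}, \widehat{F}, \widehat{A}, \widehat{p}, \widehat{\pi})$ where 

- $\widehat{S} = \{((\ell, \nu), (\ell, \zeta)) \mid (\ell, \zeta) \in \mathcal{R} \wedge \nu \in \overline{\zeta}\}$ and $\widehat{F} = \{((\ell, \nu), (\ell, \zeta)) \in \widehat{S} \mid \ell \in L_F\}$; 

- the finite set of boundary actions $\widehat{A} \subseteq ([k]_{\mathbb{N}} \times \mathcal{C} \times \mathit{Act}) \times \mathcal{R}$ and for $R \in \mathcal{R}$ we let^1 
$\widehat{A}(R) = \{\alpha \in \widehat{A}((\ell, \nu), R) \mid ((\ell, \nu), R) \in \widehat{S}\}$; 

- for any state $((\ell, \nu), (\ell, \zeta)) \in \widehat{S}$ and action $((b, c, a), (\ell, \zeta_a)) \in \widehat{A}$ we have 
$\widehat{p}(((\ell, \nu), (\ell, \zeta)), ((b, c, a), (\ell, \zeta_a))) = \mu$ if and only if 

\[ \mu((\ell', \nu'), (\ell', \zeta')) = \sum_{C \subseteq \mathcal{C} \,\wedge\, \nu_a[C := 0] = \nu' \,\wedge\, \zeta_a[C := 0] = \zeta'} \delta[\ell, a](C, \ell') \]

for all $((\ell', \nu'), (\ell', \zeta')) \in \widehat{S}$, where $\nu_a = \nu + \mathrm{time}(\nu, (b, c))$ and one of the 
following conditions holds: 

• $(\ell, \zeta) \to_{b,c} (\ell, \zeta_a)$ and $\zeta_a \in E(\ell, a)$; 

• $(\ell, \zeta) \to_{b,c} (\ell, \zeta_{-}) \to_{+1} (\ell, \zeta_a)$ for some $(\ell, \zeta_{-})$ and $\zeta_a \in E(\ell, a)$; 

• $(\ell, \zeta) \to_{b,c} (\ell, \zeta_{+}) \leftarrow_{+1} (\ell, \zeta_a)$ for some $(\ell, \zeta_{+})$ and $\zeta_a \in E(\ell, a)$. 

- $\widehat{\pi} : \widehat{S} \times \widehat{A} \to \mathbb{R}_{\oplus}$ is such that for $((\ell, \nu), (\ell, \zeta)) \in \widehat{S}$ and $((b, c, a), R) \in 
\widehat{A}(((\ell, \nu), (\ell, \zeta)))$ we have $\widehat{\pi}(((\ell, \nu), (\ell, \zeta)), ((b, c, a), R)) = b - \nu(c)$. 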

Although the boundary region graph is infinite, for a fixed initial state we can restrict 
attention to a finite state subgraph, thanks to the following observation [20]. 

Lemma 8. For any state of a boundary region graph, its reachable sub-graph is finite. 

5 Solving PTA Games on the Boundary Region Graph. 

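We now show that the boundary region graph abstraction for PTAs is sufficient to solve 
the expected reachability-time games. The partition of the locations of a probabilistic 
timed game arena $\Gamma = (\mathsf{T}, L_{Min}, L_{Max})$ gives rise to the partition $(\widehat{S}_{Min}, \widehat{S}_{Max})$ of the 
set of states $\widehat{S}$ of its boundary region graph, and we let $\widehat{\Gamma} = (\widehat{\mathsf{T}}, \widehat{S}_{Min}, \widehat{S}_{Max})$. 

1 Notice that $\widehat{A}(R) = \widehat{A}(s)$ for all $s = ((\ell, \nu), R) \in \widehat{S}$. 

We begin by reviewing the optimality equations for an expected reachability-time 
game on a boundary region graph $\widehat{\Gamma}$. Let $P : \widehat{S} \to \mathbb{R}_{\oplus}$. We say that $P$ is a solution of 
optimality equations $\mathrm{Opt}(\widehat{\Gamma})$, and we write $P \models \mathrm{Opt}(\widehat{\Gamma})$, if for any $s \in \widehat{S}$: 

\[ P(s) = \begin{cases} 0 & \text{if } s \in \widehat{F} \\ \min_{\alpha \in \widehat{A}(s)} \big\{ t(s, \alpha) + \sum_{s' \in \widehat{S}} \widehat{p}(s' \mid s, \alpha) \cdot P(s') \big\} & \text{if } s \in \widehat{S}_{Min} \setminus \widehat{F} \\ \max_{\alpha \in \widehat{A}(s)} \big\{ t(s, \alpha) + \sum_{s' \in \widehat{S}} \widehat{p}(s' \mid s, \alpha) \cdot P(s') \big\} & \text{if } s \in \widehat{S}_{Max} \setminus \widehat{F}. \end{cases} \]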

Before trying to solve $\mathrm{Opt}(\widehat{\Gamma})$ for a probabilistic timed game, let us consider the simpler 
case when $\Gamma$ is a timed game. 

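The non-probabilistic case. For a timed game $\Gamma$ we define $\mathrm{Succ} : \widehat{S} \times \widehat{A} \to \widehat{S}$ as 
follows: 

\[ \mathrm{Succ}(((\ell, \nu), R), ((b, c, a), (\ell, \zeta))) = ((\ell', (\nu + b - \nu(c))[C := 0]), (\ell', \zeta[C := 0])), \]

where $(C, \ell') \in 2^{\mathcal{C}} \times L$ is such that $\delta(\ell, a)(C, \ell') = 1$. Now, using this function, the 
optimality equations $\mathrm{Opt}(\widehat{\Gamma})$ can be rewritten as: 

\[ P(s) = \begin{cases} 0 & \text{if } s \in \widehat{F} \\ \min_{\alpha \in \widehat{A}(s)} \{ t(s, \alpha) + P(\mathrm{Succ}(s, \alpha)) \} & \text{if } s \in \widehat{S}_{Min} \setminus \widehat{F} \\ \max_{\alpha \in \widehat{A}(s)} \{ t(s, \alpha) + P(\mathrm{Succ}(s, \alpha)) \} & \text{if } s \in \widehat{S}_{Max} \setminus \widehat{F}. \end{cases} \]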

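Based on these equations, [3] introduced the following value iteration algorithm. 

Algorithm 9. Value iteration algorithm for (non-probabilistic) $\mathrm{Opt}(\widehat{\Gamma})$. 

1. Set $i := 0$, $P_0(s) := 0$ if $s \in \widehat{F}$ and $P_0(s) := \infty$ otherwise. 

2. Set $P_{i+1} := \Phi(P_i)$. 

3. If $P_{i+1} = P_i$ then return $P_i$, else set $i := i + 1$ and goto step 2. 

where $\Phi : [\widehat{S} \to \mathbb{R}_{\oplus}] \to [\widehat{S} \to \mathbb{R}_{\oplus}]$ is such that for any $f : \widehat{S} \to \mathbb{R}_{\oplus}$ and $s \in \widehat{S}$: 

\[ \Phi(f)(s) = \begin{cases} 0 & \text{if } s \in \widehat{F} \\ \min_{\alpha \in \widehat{A}(s)} \{ t(s, \alpha) + f(\mathrm{Succ}(s, \alpha)) \} & \text{if } s \in \widehat{S}_{Min} \setminus \widehat{F} \\ \max_{\alpha \in \widehat{A}(s)} \{ t(s, \alpha) + f(\mathrm{Succ}(s, \alpha)) \} & \text{if } s \in \widehat{S}_{Max} \setminus \widehat{F}. \end{cases} \tag{1} \]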
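
The shape of Algorithm 9 can be sketched on a small finite game graph. This is a simplified illustration, not the symbolic algorithm over regionally simple functions: it assumes an explicit finite set of states, a hypothetical `moves` table giving `(time, successor)` pairs in place of $t(s, \alpha)$ and $\mathrm{Succ}(s, \alpha)$, and that every non-final state has at least one move. The iteration cap `max_rounds` is an added safeguard that is not part of the algorithm.

```python
import math


def value_iteration(states, final, owner, moves, max_rounds=1000):
    """Iterate P_{i+1} := Phi(P_i) from P_0 (0 on final states, infinity elsewhere)."""
    values = {s: (0 if s in final else math.inf) for s in states}
    for _ in range(max_rounds):
        new_values = {}
        for s in states:
            if s in final:
                new_values[s] = 0
                continue
            candidates = [t + values[succ] for t, succ in moves[s]]
            pick = min if owner[s] == "Min" else max
            new_values[s] = pick(candidates)
        if new_values == values:  # step 3: fixpoint reached
            return values
        values = new_values
    raise RuntimeError("no fixpoint reached within max_rounds")


# Hypothetical three-state game: Min owns s0, Max owns s1.
states = ["s0", "s1", "goal"]
final = {"goal"}
owner = {"s0": "Min", "s1": "Max"}
moves = {
    "s0": [(2, "s1"), (5, "goal")],
    "s1": [(1, "goal"), (4, "s0")],
}
result = value_iteration(states, final, owner, moves)
```

Here Min prefers the direct move to `goal` at cost 5, because from `s1` Max would send the token back to `s0` and prolong the play.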

The proof of correctness of this algorithm is reliant on the concept of simple functions, 
and certain closure properties of these functions, which we now review. 

Definition 10 (Simple Functions). Let $X \subseteq V$. A function $F : X \to \mathbb{R}$ is simple if 
either: there is $e \in \mathbb{Z}$ such that $F(\nu) = e$ for every $\nu \in X$; or there are $e \in \mathbb{Z}$ and 
$c \in \mathcal{C}$ such that $F(\nu) = e - \nu(c)$ for all $\nu \in X$. 

We say a function $F : \widehat{S} \to \mathbb{R}_{\oplus}$ is regionally simple if for every region $(\ell, \zeta) \in \mathcal{R}$ 
the function $F((\ell, \cdot), (\ell, \zeta))$ is simple. Asarin and Maler [3] showed that the following 
properties hold for simple functions. 

Proposition 11 (Properties of simple functions). 

1. If $F : X \to \mathbb{R}$ is simple, then $\overline{F} : \overline{X} \to \mathbb{R}$ is simple. 

2. If $F, F' : \widehat{S} \to \mathbb{R}$ are regionally simple functions, then $\min(F, F')$ and $\max(F, F')$ 
are also regionally simple.^1 

3. If $F$ is regionally simple, then, for every region $R = (\ell, \zeta)$ and $\alpha \in \widehat{A}(R)$, the 
function $t(((\ell, \cdot), R), \alpha) + F(\mathrm{Succ}(((\ell, \cdot), R), \alpha))$ is simple. 

4. Any decreasing sequence of regionally simple functions is finite. 

Using the first three closure properties of simple functions, it is easy to see that the 
function $\Phi$ in (1) is such that, if $f$ is regionally simple, then so is $\Phi(f)$. Since the initial 
function $P_0$ is regionally simple, it is immediate that, if Algorithm 9 terminates, it will 
return a regionally simple solution of $\mathrm{Opt}(\widehat{\Gamma})$. Now, since the function $\Phi$ is monotonic, 
the sequence $\langle P_0, P_1, P_2, \ldots \rangle$ of intermediate value functions in Algorithm 9 is a 
decreasing sequence of regionally simple functions. Proposition 11(4) then guarantees 
the termination of the value iteration algorithm. Jurdziński and Trivedi [18] show that if 
a solution of $\mathrm{Opt}(\widehat{\Gamma})$ is regionally simple then it gives a solution of optimality equations 
for the original timed automaton. 

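The probabilistic case. In this section we consider extending the above approach 
to solve $\mathrm{Opt}(\widehat{\Gamma})$ when the underlying automaton is a probabilistic timed automaton. Based on the optimality 
equations, we define the value improvement function $\Psi : [\widehat{S} \to \mathbb{R}_{\oplus}] \to [\widehat{S} \to \mathbb{R}_{\oplus}]$ 
such that for any $f : \widehat{S} \to \mathbb{R}_{\oplus}$ and $s \in \widehat{S}$: 

\[ \Psi(f)(s) = \begin{cases} 0 & \text{if } s \in \widehat{F} \\ \min_{\alpha \in \widehat{A}(s)} \big\{ t(s, \alpha) + \sum_{s' \in \widehat{S}} \widehat{p}(s' \mid s, \alpha) \cdot f(s') \big\} & \text{if } s \in \widehat{S}_{Min} \setminus \widehat{F} \\ \max_{\alpha \in \widehat{A}(s)} \big\{ t(s, \alpha) + \sum_{s' \in \widehat{S}} \widehat{p}(s' \mid s, \alpha) \cdot f(s') \big\} & \text{if } s \in \widehat{S}_{Max} \setminus \widehat{F}. \end{cases} \tag{2} \]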

It is straightforward to verify that a fixpoint of the function $\Psi$ is a solution of 
$\mathrm{Opt}(\widehat{\Gamma})$. By Assumption 3 and Lemma 8 it is immediate that $\Psi$ is a contraction, and 
therefore $\Psi$ can be used in a straightforward value iteration algorithm to approximate 
the solution of $\mathrm{Opt}(\widehat{\Gamma})$. However, trying to extend the approach of Asarin and Maler [3] fails since 
the intermediate functions in the value iteration algorithm no longer remain regionally 
simple. To overcome this problem, we present a generalisation of simple functions, 
which we call quasi-simple functions. 
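
The approximation by value iteration mentioned above can be sketched on a finite stochastic game. This is an illustrative stand-in under stated assumptions: states are explicit and finite, each action is a hypothetical `(time, distribution)` pair, the iteration starts from the zero function, and it stops when successive iterates differ by less than a tolerance. The names `improve` and `approximate_values` and the stopping rule are choices made for this sketch.

```python
def improve(values, final, owner, actions):
    """One application of a finite analogue of the value improvement function."""
    new_values = {}
    for s in values:
        if s in final:
            new_values[s] = 0.0
            continue
        candidates = [
            time + sum(prob * values[succ] for succ, prob in dist.items())
            for time, dist in actions[s]
        ]
        new_values[s] = min(candidates) if owner[s] == "Min" else max(candidates)
    return new_values


def approximate_values(states, final, owner, actions, tolerance=1e-9, max_rounds=10_000):
    """Iterate `improve` from the zero function until successive iterates are close."""
    values = {s: 0.0 for s in states}
    for _ in range(max_rounds):
        new_values = improve(values, final, owner, actions)
        if max(abs(new_values[s] - values[s]) for s in states) < tolerance:
            return new_values
        values = new_values
    raise RuntimeError("did not converge within max_rounds")


# Hypothetical example: from s0, Min either retries (1 time unit, succeeds with
# probability 1/2) or takes a slow but certain route (3 time units).
states = ["s0", "goal"]
final = {"goal"}
owner = {"s0": "Min"}
actions = {"s0": [(1.0, {"goal": 0.5, "s0": 0.5}), (3.0, {"goal": 1.0})]}
approx = approximate_values(states, final, owner, actions)
```

Retrying has expected time $v = 1 + v/2$, i.e. $v = 2$, which beats the certain route, and the iterates approach this value geometrically.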

Before introducing quasi-simple functions, we require the partial order $\trianglelefteq \; \subseteq V \times V$, 
where for any valuations $\nu$ and $\nu'$ we have $\nu \trianglelefteq \nu'$ if and only if there exists a $t \in \mathbb{R}_{\oplus}$ such 
that for each clock $c \in \mathcal{C}$ either $\nu'(c) - \nu(c) = t$ or $\nu'(c) = \nu(c)$, and $\nu'(c) - \nu(c) = t$ 
for at least one clock $c \in \mathcal{C}$. In this case we also write $(\nu' - \nu) = t$. 
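
The relation just defined can be checked mechanically. The sketch below assumes valuations are dictionaries over the same non-empty set of clock names with exact rational values; the function name `delay_gap` is hypothetical. It returns the witness $t$ when $\nu \trianglelefteq \nu'$ and `None` otherwise.

```python
from fractions import Fraction


def delay_gap(v, v_prime):
    """Return t with (v' - v) = t if every clock either stays put or advances by t, else None."""
    diffs = {v_prime[c] - v[c] for c in v}
    nonzero = {d for d in diffs if d != 0}
    if not nonzero:
        return Fraction(0)  # identical valuations: t = 0 is a witness
    if len(nonzero) == 1:
        (t,) = nonzero
        if t > 0:
            return t
    return None  # clocks moved by different amounts, or moved backwards


v = {"x": Fraction(1, 5), "y": Fraction(1, 2)}
moved_x = {"x": Fraction(3, 5), "y": Fraction(1, 2)}
moved_both_unequally = {"x": Fraction(3, 5), "y": Fraction(7, 10)}
```

In the example, only $x$ advances (by $2/5$) in `moved_x`, so the relation holds, while in `moved_both_unequally` the two clocks advance by different amounts and it does not.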

Definition 12 (Quasi-Simple Functions). Let $X \subseteq V$ be a subset of valuations. A 
function $F : X \to \mathbb{R}$ is quasi-simple if: 

- (Lipschitz continuous) there exists $K \geq 0$ such that $|F(\nu) - F(\nu')| \leq K \cdot \|\nu - \nu'\|_{\infty}$ 
for all $\nu, \nu' \in X$; 

- (Monotonically decreasing and nonexpansive w.r.t. $\trianglelefteq$) $\nu \trianglelefteq \nu'$ implies $F(\nu) \geq F(\nu')$ 
and $F(\nu) - F(\nu') \leq \nu' - \nu$ for all $\nu, \nu' \in X$. 

1 For functions $F, F' : \widehat{S} \to \mathbb{R}$ we define functions $\max(F, F'), \min(F, F') : \widehat{S} \to \mathbb{R}$ by 
$\max(F, F')(s) = \max\{F(s), F'(s)\}$ and $\min(F, F')(s) = \min\{F(s), F'(s)\}$, for every 
$s \in \widehat{S}$. 

Proposition 13 (Quasi-simple functions generalise simple functions). Every simple 
function is also quasi-simple. 

Proof. Let $X \subseteq V$ be a subset of valuations and $F : X \to \mathbb{R}$ a simple function. If 
$F$ is constant then the proposition trivially follows. Otherwise, there exist $b \in \mathbb{Z}$ and 
$c \in \mathcal{C}$ such that $F(\nu) = b - \nu(c)$ for all $\nu \in X$. We need to show that $F$ is Lipschitz 
continuous, and monotonically decreasing and nonexpansive w.r.t. $\trianglelefteq$. 

1. To prove that $F$ is Lipschitz continuous, notice that $|F(\nu) - F(\nu')| = |b - \nu(c) - 
b + \nu'(c)| = |\nu'(c) - \nu(c)| \leq \|\nu - \nu'\|_{\infty}$. 

2. For $\nu, \nu' \in X$ such that $\nu \trianglelefteq \nu'$, we have $F(\nu) = b - \nu(c) \geq b - \nu'(c) = F(\nu')$. 
From the first part of this proof, it trivially follows that $F(\nu) - F(\nu') \leq \nu' - \nu$. □ 

We say a function $F : \widehat{S} \to \mathbb{R}$ is regionally quasi-simple if for every region $(\ell, \zeta) \in \mathcal{R}$ 
the function $F((\ell, \cdot), (\ell, \zeta))$ is quasi-simple. 

Lemma 14 (Properties of Quasi-Simple Functions). 

1. If $F : X \to \mathbb{R}$ is quasi-simple, then $\overline{F} : \overline{X} \to \mathbb{R}$ is quasi-simple. 

2. If $F, F' : \widehat{S} \to \mathbb{R}$ are regionally quasi-simple functions, then $\max(F, F')$ and 
$\min(F, F')$ are also regionally quasi-simple. 

3. If $F$ is regionally quasi-simple, then, for any $R = (\ell, \zeta)$ and $\alpha \in \widehat{A}(R)$, the function 
$t(((\ell, \cdot), R), \alpha) + \sum_{s' \in \widehat{S}} \widehat{p}(s' \mid ((\ell, \cdot), R), \alpha) \cdot F(s')$ is quasi-simple. 

4. The limit of a sequence of quasi-simple functions is quasi-simple. 

From the first three properties, it follows that the intermediate functions of the value 
iteration based on $\Psi$ in (2) are regionally quasi-simple. In addition, the fourth property 
implies that its fixpoint is also regionally quasi-simple, and hence the solution of 
$\mathrm{Opt}(\widehat{\Gamma})$ is regionally quasi-simple. 

Proposition 15. Let $\Gamma$ be a probabilistic timed game. If $P \models \mathrm{Opt}(\widehat{\Gamma})$, then $P$ is 
regionally quasi-simple. 

6 Correctness of the Reduction 

We now demonstrate the correctness of our results, showing that the problem of 
expected reachability-time games on PTAs can be reduced to expected reachability-price 
games over the boundary region graph. For a given function $f : \widehat{S} \to \mathbb{R}$, we 
define $\widetilde{f} : S \to \mathbb{R}$ by $\widetilde{f}(\ell, \nu) = f((\ell, \nu), (\ell, [\nu]))$. Formally we have the following 
result. 

Theorem 16. Let $\Gamma$ be a probabilistic timed game. If $P \models \mathrm{Opt}(\widehat{\Gamma})$, then $\widetilde{P} \models \mathrm{Opt}(\Gamma)$. 

Before giving the proof we require the following property of quasi-simple functions. 

Lemma 17. Let $s = (\ell, \nu) \in S$ and $(\ell, \zeta) \in \mathcal{R}$ such that $(\ell, [\nu]) \to_{*} (\ell, \zeta)$. If $F : \widehat{S} \to 
\mathbb{R}$ is regionally quasi-simple, then the function $F^{\oplus}_{s,\zeta,a} : I \to \mathbb{R}$ defined as 

\[ F^{\oplus}_{s,\zeta,a}(t) = t + \sum_{(C, \ell') \in 2^{\mathcal{C}} \times L} \delta[\ell, a](C, \ell') \cdot F((\ell', \nu^{t}_{C}), (\ell', \zeta_{C})) \]

is continuous and nondecreasing, where $I = \{t \in \mathbb{R}_{\oplus} \mid \nu + t \in \zeta\}$, $\nu^{t}_{C} = (\nu + t)[C := 0]$ 
and $\zeta_{C} = \zeta[C := 0]$. 

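Proof (of Theorem 16). Suppose that $P \models \mathrm{Opt}(\hat{\mathcal{T}})$. To prove this theorem it is sufficient
to show that for any $s = (\ell, \nu) \in S_{\mathrm{Min}}$ we have:

\[ \tilde{P}(s) = \inf_{(t,a) \in A(s)} \Big\{ t + \sum_{(C,\ell') \in 2^{\mathcal{C}} \times L} \delta[\ell,a](C,\ell') \cdot \tilde{P}(\ell', (\nu + t)[C := 0]) \Big\} \tag{3} \]

and for any $s = (\ell, \nu) \in S_{\mathrm{Max}}$ we have:

\[ \tilde{P}(s) = \sup_{(t,a) \in A(s)} \Big\{ t + \sum_{(C,\ell') \in 2^{\mathcal{C}} \times L} \delta[\ell,a](C,\ell') \cdot \tilde{P}(\ell', (\nu + t)[C := 0]) \Big\}. \tag{4} \]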

In the remainder of the proof we restrict attention to Min states as the case for Max states
follows similarly. Therefore we fix $s = (\ell, \nu) \in S_{\mathrm{Min}}$ for the remainder of the proof. For
$a \in \mathrm{Act}$, let $\mathcal{R}^a_{\mathrm{Thin}}$ and $\mathcal{R}^a_{\mathrm{Thick}}$ denote the sets of thin and thick regions respectively that
are successors of $[\nu]$ and are subsets of $E(\ell, a)$. Considering the right hand side (RHS)
of (3) we have:

\[ \text{RHS of (3)} = \min_{a \in \mathrm{Act}} \{ T_{\mathrm{Thin}}(s, a), T_{\mathrm{Thick}}(s, a) \}, \tag{5} \]

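where $T_{\mathrm{Thin}}(s, a)$ (respectively $T_{\mathrm{Thick}}(s, a)$) is the infimum of the RHS of (3) over all
actions $(t, a)$ such that $[\nu + t] \in \mathcal{R}^a_{\mathrm{Thin}}$ (respectively $\mathcal{R}^a_{\mathrm{Thick}}$). For the first term we have:

\[
\begin{aligned}
T_{\mathrm{Thin}}(s,a)
&= \min_{(\ell,\zeta) \in \mathcal{R}^a_{\mathrm{Thin}}} \; \inf_{t \,:\, \nu + t \in \zeta} \Big\{ t + \sum_{(C,\ell') \in 2^{\mathcal{C}} \times L} \delta[\ell,a](C,\ell') \cdot \tilde{P}(\ell', \nu^t_C) \Big\} \\
&= \min_{(\ell,\zeta) \in \mathcal{R}^a_{\mathrm{Thin}}} \; \inf_{t \,:\, \nu + t \in \zeta} \Big\{ t + \sum_{(C,\ell') \in 2^{\mathcal{C}} \times L} \delta[\ell,a](C,\ell') \cdot P((\ell', \nu^t_C), (\ell', \zeta^C)) \Big\} \\
&= \min_{R = (\ell,\zeta) \in \mathcal{R}^a_{\mathrm{Thin}}} \Big\{ t^s_R + \sum_{(C,\ell') \in 2^{\mathcal{C}} \times L} \delta[\ell,a](C,\ell') \cdot P((\ell', \nu^{t^s_R}_C), (\ell', \zeta^C)) \Big\},
\end{aligned}
\]

where $\nu^t_C$ denotes the clock valuation $(\nu + t)[C := 0]$, $t^s_R$ the time to reach the region
$R$ from $s$, and $\zeta^C$ the region $\zeta[C := 0]$. Considering the second term of (5) we have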

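\[
\begin{aligned}
T_{\mathrm{Thick}}(s,a)
&= \min_{(\ell,\zeta) \in \mathcal{R}^a_{\mathrm{Thick}}} \; \inf_{t \,:\, \nu + t \in \zeta} \Big\{ t + \sum_{(C,\ell') \in 2^{\mathcal{C}} \times L} \delta[\ell,a](C,\ell') \cdot \tilde{P}(\ell', \nu^t_C) \Big\} \\
&= \min_{(\ell,\zeta) \in \mathcal{R}^a_{\mathrm{Thick}}} \; \inf_{t \,:\, \nu + t \in \zeta} \Big\{ t + \sum_{(C,\ell') \in 2^{\mathcal{C}} \times L} \delta[\ell,a](C,\ell') \cdot P((\ell', \nu^t_C), (\ell', \zeta^C)) \Big\} \\
&= \min_{R = (\ell,\zeta) \in \mathcal{R}^a_{\mathrm{Thick}}} \; \inf_{t^s_{R_-} < t < t^s_{R_+}} \Big\{ t + \sum_{(C,\ell') \in 2^{\mathcal{C}} \times L} \delta[\ell,a](C,\ell') \cdot P((\ell', \nu^t_C), (\ell', \zeta^C)) \Big\}.
\end{aligned}
\]

From Proposition 15 it follows that $P$ is regionally quasi-simple and, from Lemma 17,
the function:

\[ t \mapsto t + \sum_{(C,\ell') \in 2^{\mathcal{C}} \times L} \delta[\ell,a](C,\ell') \cdot P((\ell', \nu^t_C), (\ell', \zeta^C)) \]

is continuous and nondecreasing over $\{ t \mid \nu + t \in \zeta \}$. Therefore it follows that

\[ T_{\mathrm{Thick}}(s,a) = \min_{R = (\ell,\zeta) \in \mathcal{R}^a_{\mathrm{Thick}}} \; \min_{t \in \{ t^s_{R_-},\, t^s_{R_+} \}} \Big\{ t + \sum_{(C,\ell') \in 2^{\mathcal{C}} \times L} \delta[\ell,a](C,\ell') \cdot P((\ell', \nu^t_C), (\ell', \zeta^C)) \Big\}. \]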

Substituting the values of $T_{\mathrm{Thin}}(s, a)$ and $T_{\mathrm{Thick}}(s, a)$ into (5) and observing that, for
any thin region $(\ell, \zeta) \in \mathcal{R}^a_{\mathrm{Thin}}$, there exist $b \in \mathbb{Z}$ and $c \in \mathcal{C}$ such that $\nu + (b - \nu(c)) \in \zeta$,
it follows from Definition 7 that the RHS of (3) equals:

\[ \min_{\alpha \in A(s,[s])} \Big\{ t((s,[s]), \alpha) + \sum_{(s',R') \in \hat{S}} p((s',R') \mid (s,[s]), \alpha) \cdot P(s',R') \Big\} \]

which by definition equals $\tilde{P}(s)$ as required. □



7 Expected Discounted-Time Games 

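Let $\mathcal{T} = (\mathsf{T}, L_{\mathrm{Min}}, L_{\mathrm{Max}})$ be a probabilistic timed game arena and $\lambda \in [0, 1)$ be a
discount factor. In an expected discounted-time game starting from state $s$ and for a
strategy pair $\mu, \chi$, player Min loses the following amount to player Max:

\[ \mathrm{EDisc}_{\mathcal{T}}(s, \mu, \chi) = \mathbb{E}^{\mu,\chi}_{s} \Big\{ \sum_{i=1}^{\infty} \lambda^{i} \cdot \pi(X_{i-1}, Y_i) \Big\}. \]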

The concepts for the expected discounted-time game are defined in an analogous
manner to those of expected reachability-time games. A reduction from expected
discounted-time games to expected reachability-time games is standard [25]. Therefore,
using the techniques presented in this paper one can reduce the problem of solving
expected discounted-time games on $\mathcal{T}$ to solving the corresponding problem on $\hat{\mathcal{T}}$.
Similarly, a (non-probabilistic) discounted-time game on timed automata can be reduced
to solving discounted-price games on the (non-probabilistic) boundary region graph.

Proposition 18 (Discounted-Time Games). 

1. Expected discounted-time games on probabilistic timed automata can be reduced 
to expected discounted-price games on the corresponding boundary region graph. 

2. Discounted-time games on timed automata can be reduced to discounted-price 
games on the corresponding (non-probabilistic) boundary region graph. 
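
To illustrate why discounting can be traded for reachability, the sketch below works through one textbook version of the idea on a player-free finite Markov chain: at each step the process continues with probability `lam` and otherwise jumps to a fresh absorbing sink. This is an assumption-laden illustration, not the construction of [25]; all names (`discounted_values`, `stopped_chain`, `expected_cost_to_sink`) and the two-state example are hypothetical, and the discounted value is taken to weight the reward of step i by `lam**i`.

```python
def discounted_values(P, r, lam, iterations=2000):
    """Fixpoint of v(s) = lam * (r(s) + sum_t P[s][t] * v(t))."""
    n = len(r)
    v = [0.0] * n
    for _ in range(iterations):
        v = [lam * (r[s] + sum(P[s][t] * v[t] for t in range(n))) for s in range(n)]
    return v

def stopped_chain(P, r, lam):
    """Add an absorbing sink reached with probability 1 - lam at every step."""
    n = len(r)
    P2 = [[lam * P[s][t] for t in range(n)] + [1.0 - lam] for s in range(n)]
    P2.append([0.0] * n + [1.0])            # the sink loops on itself
    cost = [lam * r[s] for s in range(n)] + [0.0]
    return P2, cost

def expected_cost_to_sink(P2, cost, iterations=2000):
    """Expected total cost accumulated before reaching the sink (last state)."""
    m = len(cost)
    sink = m - 1
    w = [0.0] * m
    for _ in range(iterations):
        w = [0.0 if s == sink else cost[s] + sum(P2[s][t] * w[t] for t in range(m))
             for s in range(m)]
    return w

# Hypothetical two-state chain with rewards 1 and 2 and discount factor 0.9.
P = [[0.5, 0.5], [0.0, 1.0]]
r = [1.0, 2.0]
lam = 0.9
v = discounted_values(P, r, lam)
P2, cost = stopped_chain(P, r, lam)
w = expected_cost_to_sink(P2, cost)
```

Under these assumptions the discounted values `v` coincide with the expected reachability costs `w` on the original states, because both satisfy the same linear equations.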

8 Complexity

Theorem 19. The expected reachability-time games and the expected discounted-time
games are EXPTIME-hard and they are in NEXPTIME $\cap$ co-NEXPTIME.

Proof. The EXPTIME-hardness of expected reachability-time games and expected
discounted-time games on probabilistic timed automata with two or more clocks
follows from the fact that the corresponding one-player games are EXPTIME-complete [16].
The membership in NEXPTIME $\cap$ co-NEXPTIME follows from the reduction to the
boundary region graph, and the observations that: the size of the boundary region graph is
exponential in the size of the PTA; and the complexity of solving expected reachability-price
games and expected discounted-price games on finite MDP is in NP $\cap$ co-NP. □

9 Conclusion 

In this paper we have employed the boundary region graph to solve quantitative games
over probabilistic timed automata. The approach is based on extending the class of
simple functions introduced by Asarin and Maler to quasi-simple functions. Our results demonstrate
that the problem of solving games with either expected reachability-time or expected
discounted-time criteria on PTAs is in NEXPTIME $\cap$ co-NEXPTIME. Future work
includes finding practical symbolic zone-based algorithms to solve quantitative games
on timed automata and, perhaps more ambitiously, games on PTAs. Regarding the other
quantitative games on PTA, we conjecture that it is possible to reduce expected average- 
time games on PTA to mean payoff games on boundary region graph. However, the 
techniques presented in this paper are insufficient to demonstrate such a reduction. 

A Proof of Proposition 6

Proof (of Proposition 6). We show that for every $\varepsilon > 0$, there exists a pure strategy $\mu_\varepsilon :
\mathrm{FPlay} \to A$ for player Min, such that for every strategy $\chi$ for player Max, we have
$\mathrm{EReach}(s, \mu_\varepsilon, \chi) \leq P(s) + \varepsilon$. The proof that for every $\varepsilon > 0$ there exists a pure strategy
$\chi_\varepsilon : \mathrm{FPlay} \to A$ for player Max, such that for every strategy $\mu$ for player Min we
have $\mathrm{EReach}(s, \mu, \chi_\varepsilon) \geq P(s) - \varepsilon$, follows similarly. Together, these facts imply that
$P$ is equal to the value function of the expected reachability-time game, and the pure
strategies $\mu_\varepsilon$ and $\chi_\varepsilon$, defined in the proof below for all $\varepsilon > 0$, are $\varepsilon$-optimal.

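Let us fix $\varepsilon > 0$ and let $\mu_\varepsilon$ be a pure strategy where for any $n \in \mathbb{N}$ and finite play
$r \in \mathrm{FPlay}$ of length $n$, $\mu_\varepsilon(r) = (t, a)$ is such that

\[ t + \sum_{s' \in S} p(s' \mid \mathrm{last}(r), (t, a)) \cdot P(s') \leq P(\mathrm{last}(r)) + \frac{\varepsilon}{2^{n+1}}. \]

Observe that for every state $s \in S_{\mathrm{Min}}$ and for every $\varepsilon' > 0$, there is an $\varepsilon'$-optimal timed
action because $P \models \mathrm{Opt}(\mathcal{T})$.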
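
The reason for letting the slack shrink geometrically with the length of the play can be made explicit. Assuming, as in the choice of $\mu_\varepsilon$ above, that the slack allowed after a play of length $n$ is $\varepsilon / 2^{n+1}$, the total slack accumulated along any play is bounded by $\varepsilon$:

```latex
\sum_{n=0}^{\infty} \frac{\varepsilon}{2^{n+1}}
  \;=\; \varepsilon \sum_{n=1}^{\infty} 2^{-n}
  \;=\; \varepsilon .
```

If plays are counted from length $1$ instead, the same sum gives $\varepsilon/2$, so the bound by $\varepsilon$ holds under either convention.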

Again using the fact that $P \models \mathrm{Opt}(\mathcal{T})$, it follows that for any $s \in S_{\mathrm{Max}} \setminus F$
and $(t, a) \in A(s)$, we have

\[ P(s) \geq t + \sum_{s' \in S} p(s' \mid s, (t, a)) \cdot P(s'). \tag{6} \]

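Now for an arbitrary strategy $\chi$ for player Max, it follows by induction that for any
$n \geq 1$:

Using Assumption 3 we have $\lim_{n \to \infty} \sum_{s' \in S \setminus F} \mathrm{Prob}^{\mu_\varepsilon,\chi}_{s}(X_n = s') = 0$, and therefore
taking the limit in (7) we get the inequality:

\[ P(s) \geq \mathbb{E}^{\mu_\varepsilon,\chi}_{s} \Big\{ \sum_{i=1}^{\min\{ i \mid X_i \in F \}} \pi(X_{i-1}, Y_i) \Big\} - \varepsilon = \mathrm{EReach}(s, \mu_\varepsilon, \chi) - \varepsilon, \]

which completes the proof. □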



B Proof of Lemma 14

The proof of Lemma 14 follows from Propositions 20-23 below. Note that since
every quasi-simple function $F : X \to \mathbb{R}$ is Lipschitz continuous, and hence Cauchy
continuous, it can be uniquely extended to the closure $\overline{X}$ of its domain $X$. The properties of
quasi-simple functions are trivially met by such extensions.

Proposition 20. If $F : X \to \mathbb{R}$ is quasi-simple, then its extension $\overline{F} : \overline{X} \to \mathbb{R}$ is quasi-simple.

Proposition 21. If $F, F' : \hat{S} \to \mathbb{R}$ are regionally quasi-simple functions, then
$\max(F, F')$ and $\min(F, F')$ are also regionally quasi-simple.

Proof. To prove this proposition, it is sufficient to show that the pointwise minimum and
maximum of quasi-simple functions are quasi-simple. Let $f, f' : X \subseteq V \to \mathbb{R}$ be
quasi-simple. We need to show that $\max(f, f')$ and $\min(f, f')$ are quasi-simple.

Notice that $\max(f, f')$ and $\min(f, f')$ are Lipschitz continuous, as the pointwise
minimum and maximum of a finite set of Lipschitz continuous functions is Lipschitz
continuous.

It therefore remains to show that $\max(f, f')$ and $\min(f, f')$ are monotonically
decreasing and nonexpansive w.r.t. $\trianglelefteq$. Consider any $\nu_1, \nu_2 \in X$ such that $\nu_1 \trianglelefteq \nu_2$. Since
$f$ and $f'$ are quasi-simple, by definition $f$ and $f'$ are monotonically decreasing, and
hence $f(\nu_1) \geq f(\nu_2)$ and $f'(\nu_1) \geq f'(\nu_2)$. Now since

\[ \max(f, f')(\nu_1) = \max\{ f(\nu_1), f'(\nu_1) \} \geq \max\{ f(\nu_2), f'(\nu_2) \} = \max(f, f')(\nu_2), \]

it follows that $\max(f, f')$ is monotonically decreasing w.r.t. $\trianglelefteq$. In an analogous manner
we show that $\min(f, f')$ is monotonically decreasing w.r.t. $\trianglelefteq$.

Again since $f$ and $f'$ are quasi-simple, we have that they are nonexpansive,
i.e., $f(\nu_1) - f(\nu_2) \leq \nu_2 - \nu_1$ and $f'(\nu_1) - f'(\nu_2) \leq \nu_2 - \nu_1$. To show that $\max(f, f')$ is
nonexpansive, there are the following four cases to consider.

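1. If $f(\nu_1) \geq f'(\nu_1)$ and $f(\nu_2) \geq f'(\nu_2)$, then $\max(f, f')(\nu_1) - \max(f, f')(\nu_2) = f(\nu_1) - f(\nu_2) \leq \nu_2 - \nu_1$.

2. If $f'(\nu_1) \geq f(\nu_1)$ and $f'(\nu_2) \geq f(\nu_2)$, then $\max(f, f')(\nu_1) - \max(f, f')(\nu_2) = f'(\nu_1) - f'(\nu_2) \leq \nu_2 - \nu_1$.

3. If $f(\nu_1) \geq f'(\nu_1)$ and $f'(\nu_2) \geq f(\nu_2)$, then $\max(f, f')(\nu_1) - \max(f, f')(\nu_2) = f(\nu_1) - f'(\nu_2) \leq f(\nu_1) - f(\nu_2) \leq \nu_2 - \nu_1$.

4. If $f'(\nu_1) \geq f(\nu_1)$ and $f(\nu_2) \geq f'(\nu_2)$, then $\max(f, f')(\nu_1) - \max(f, f')(\nu_2) = f'(\nu_1) - f(\nu_2) \leq f'(\nu_1) - f'(\nu_2) \leq \nu_2 - \nu_1$.

Since these are all the possible cases to consider, $\max(f, f')$ is nonexpansive w.r.t. $\trianglelefteq$.
Similarly one shows that $\min(f, f')$ is nonexpansive, completing the proof. □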
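
The omitted argument for the minimum follows the same pattern. As a sketch of one of the mixed cases (the notation $\nu_1, \nu_2$ and the case split mirror the proof above and are otherwise an illustration), suppose $f(\nu_1) \leq f'(\nu_1)$ and $f'(\nu_2) \leq f(\nu_2)$; then the minimum is attained by $f$ at $\nu_1$ and by $f'$ at $\nu_2$, and nonexpansiveness of $f'$ gives the bound:

```latex
\min(f,f')(\nu_1) - \min(f,f')(\nu_2)
  \;=\; f(\nu_1) - f'(\nu_2)
  \;\leq\; f'(\nu_1) - f'(\nu_2)
  \;\leq\; \nu_2 - \nu_1 .
```

The other mixed case is symmetric, bounding $f'(\nu_1)$ by $f(\nu_1)$ and using nonexpansiveness of $f$, while the two unmixed cases reduce directly to nonexpansiveness of $f$ or of $f'$.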

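Proposition 22. If $F$ is regionally quasi-simple, then for any $R = (\ell, \zeta)$ and $\alpha \in A(R)$
the function $t(((\ell, \cdot), R), \alpha) + \sum_{s' \in \hat{S}} p(s' \mid ((\ell, \cdot), R), \alpha) \cdot F(s')$ is quasi-simple.

Proof. Let $F$ be regionally quasi-simple and fix a region $R = (\ell, \zeta)$ and a boundary
action $\alpha = ((a, b, c), (\ell, \zeta')) \in A(R)$. We need to show that the function

\[ F^{\oplus}_{R}(\nu) = t(((\ell, \nu), R), \alpha) + \sum_{s' \in \hat{S}} p(s' \mid ((\ell, \nu), R), \alpha) \cdot F(s') \]

on the domain $D = \{ \nu \in V \mid ((\ell, \nu), R) \in \hat{S} \}$ is quasi-simple. Let us first simplify the
function $F^{\oplus}_{R}$. For any $\nu \in D$ we have:

\[
\begin{aligned}
F^{\oplus}_{R}(\nu)
&= t(((\ell, \nu), R), \alpha) + \sum_{s' \in \hat{S}} p(s' \mid ((\ell, \nu), R), \alpha) \cdot F(s') \\
&= (b - \nu(c)) + \sum_{s' \in \hat{S}} p(s' \mid ((\ell, \nu), R), \alpha) \cdot F(s') \\
&= (b - \nu(c)) + \sum_{(C,\ell') \in 2^{\mathcal{C}} \times L} \delta[\ell,a](C,\ell') \cdot F((\ell', \nu_{\alpha,C}), (\ell', \zeta'[C := 0])) \\
&= (b - \nu(c)) + \sum_{(C,\ell') \in 2^{\mathcal{C}} \times L} \delta[\ell,a](C,\ell') \cdot F(s_{\ell',\nu,\alpha,C}),
\end{aligned}
\]

where $\nu_{\alpha,C} = (\nu + (b - \nu(c)))[C := 0]$ and $s_{\ell',\nu,\alpha,C} = ((\ell', \nu_{\alpha,C}), (\ell', \zeta'[C := 0]))$.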

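Next, using this simplified version, we demonstrate that $F^{\oplus}_{R}$ is Lipschitz continuous.

If $F$ is Lipschitz continuous with constant $K$, then for any $\nu, \nu' \in D$ we have

\[
\begin{aligned}
|F^{\oplus}_{R}(\nu) - F^{\oplus}_{R}(\nu')|
&\leq |\nu'(c) - \nu(c)| + \sum_{(C,\ell') \in 2^{\mathcal{C}} \times L} \delta[\ell,a](C,\ell') \cdot |F(s_{\ell',\nu,\alpha,C}) - F(s_{\ell',\nu',\alpha,C})| \\
&\leq |\nu'(c) - \nu(c)| + \sum_{(C,\ell') \in 2^{\mathcal{C}} \times L} \delta[\ell,a](C,\ell') \cdot K \cdot \|\nu - \nu'\|_\infty \\
&= |\nu'(c) - \nu(c)| + K \cdot \|\nu - \nu'\|_\infty \leq (1 + K) \cdot \|\nu - \nu'\|_\infty .
\end{aligned}
\]

The first inequality is the triangle inequality, and the second follows from the fact that $F$ is Lipschitz continuous with constant $K$.
Hence it follows that $F^{\oplus}_{R}$ is Lipschitz continuous with constant $(1 + K)$.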

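It therefore remains to show that $F^{\oplus}_{R}$ is monotonically decreasing and nonexpansive
w.r.t. $\trianglelefteq$. Consider any $\nu, \nu' \in D$ such that $\nu \trianglelefteq \nu'$ and $\nu' - \nu = d$. We have the
following two cases to consider.

- If $\nu'(c) = \nu(c)$, then for any $(C, \ell') \in 2^{\mathcal{C}} \times L$ we have that $(\nu + b - \nu(c))[C := 0] \trianglelefteq
(\nu' + b - \nu'(c))[C := 0]$, and hence $F(s_{\ell',\nu,\alpha,C}) - F(s_{\ell',\nu',\alpha,C})$ is nonnegative for all
$(C, \ell') \in 2^{\mathcal{C}} \times L$. Moreover, since $F$ is nonexpansive, we have that $F(s_{\ell',\nu,\alpha,C}) -
F(s_{\ell',\nu',\alpha,C}) \leq d$. It follows that $F^{\oplus}_{R}$ is monotonically decreasing and nonexpansive as

\[ F^{\oplus}_{R}(\nu) - F^{\oplus}_{R}(\nu') = \sum_{(C,\ell') \in 2^{\mathcal{C}} \times L} \delta[\ell,a](C,\ell') \cdot \big( F(s_{\ell',\nu,\alpha,C}) - F(s_{\ell',\nu',\alpha,C}) \big) \in [0, d] \]

and $\nu' - \nu = d$.

- If $\nu'(c) - \nu(c) = d$, then for any $(C, \ell') \in 2^{\mathcal{C}} \times L$ we have that

\[ (\nu' + b - \nu'(c))[C := 0] \trianglelefteq (\nu + b - \nu(c))[C := 0] \]

which implies that $F(s_{\ell',\nu,\alpha,C}) - F(s_{\ell',\nu',\alpha,C})$ is nonpositive for all $(C, \ell') \in 2^{\mathcal{C}} \times L$.
Moreover, since $F$ is nonexpansive, we have that $F(s_{\ell',\nu',\alpha,C}) - F(s_{\ell',\nu,\alpha,C}) \leq d$.
Similarly to the case above we have that $F^{\oplus}_{R}$ is monotonically decreasing and
nonexpansive.

The proof is now complete. □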

The following proposition is immediate as the limit of Lipschitz continuous functions 
is Lipschitz continuous, and the limit of monotonically decreasing and nonexpansive 
functions is monotonically decreasing and nonexpansive. 

Proposition 23. The limit of a sequence of quasi-simple functions is quasi-simple. 

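C Proof of Lemma 17

Proof (of Lemma 17). Let $s = (\ell, \nu) \in S$ and $(\ell, \zeta) \in \mathcal{R}$ be such that $(\ell, \zeta)$ is a successor of $(\ell, [\nu])$,
and let $F : \hat{S} \to \mathbb{R}$ be regionally quasi-simple. We wish to show that the function
$F^{\oplus}_{\zeta,a} : I \to \mathbb{R}$ defined as

\[ F^{\oplus}_{\zeta,a}(t) = t + \sum_{(C,\ell') \in 2^{\mathcal{C}} \times L} \delta[\ell,a](C,\ell') \cdot F((\ell', \nu^t_C), (\ell', \zeta^C)) \]

is continuous and nondecreasing, where $I = \{ t \in \mathbb{R}_{\geq 0} \mid \nu + t \in \zeta \}$, $\nu^t_C = (\nu + t)[C := 0]$
and $\zeta^C = \zeta[C := 0]$.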

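Let $t_1, t_2 \in I$ be such that $t_1 \leq t_2$. To prove this lemma we need to show that
$F^{\oplus}_{\zeta,a}(t_2) - F^{\oplus}_{\zeta,a}(t_1)$ is nonnegative. Now by definition we have that $F^{\oplus}_{\zeta,a}(t_2) - F^{\oplus}_{\zeta,a}(t_1)$
equals:

\[
\begin{aligned}
& t_2 - t_1 + \sum_{(C,\ell') \in 2^{\mathcal{C}} \times L} \delta[\ell,a](C,\ell') \cdot \big( F((\ell', \nu^{t_2}_C), (\ell', \zeta^C)) - F((\ell', \nu^{t_1}_C), (\ell', \zeta^C)) \big) \\
&= t_2 - t_1 - \sum_{(C,\ell') \in 2^{\mathcal{C}} \times L} \delta[\ell,a](C,\ell') \cdot \big( F((\ell', \nu^{t_1}_C), (\ell', \zeta^C)) - F((\ell', \nu^{t_2}_C), (\ell', \zeta^C)) \big) \\
&\geq t_2 - t_1 - \sum_{(C,\ell') \in 2^{\mathcal{C}} \times L} \delta[\ell,a](C,\ell') \cdot (t_2 - t_1) \\
&\geq 0,
\end{aligned}
\]

where the first inequality is due to the fact that $F$ is monotonically decreasing and
nonexpansive. □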
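
The monotonicity claim can be sanity-checked on a toy instance. The sketch below is illustrative only: the helpers `reset`, `delay` and `lemma17_function`, the two clocks, the two target locations and the per-location functions are all hypothetical, and region membership constraints are ignored, which is a simplifying assumption.

```python
def reset(valuation, clocks_to_reset):
    """Return the valuation with the given clocks set to 0."""
    return {c: (0.0 if c in clocks_to_reset else x) for c, x in valuation.items()}

def delay(valuation, t):
    """Return the valuation after letting t time units elapse."""
    return {c: x + t for c, x in valuation.items()}

def lemma17_function(valuation, branches, F):
    """Build G(t) = t + sum_k prob_k * F(target_k, (valuation + t)[C_k := 0]).

    branches: list of (probability, set of clocks to reset, target location)
    F:        function (location, valuation) -> real
    """
    def G(t):
        later = delay(valuation, t)
        return t + sum(prob * F(target, reset(later, resets))
                       for prob, resets, target in branches)
    return G

def toy_F(location, v):
    """Hypothetical value function: decreasing and nonexpansive in each location."""
    return 3.0 - v["x"] if location == "l1" else 4.0 - v["y"]

start = {"x": 0.2, "y": 0.5}
branches = [(0.5, {"x"}, "l1"), (0.5, set(), "l2")]
G = lemma17_function(start, branches, toy_F)
samples = [i / 100 for i in range(50)]
```

Here `G(t)` works out to `0.5 * t + 3.25`: the delay contributes `t`, while the expected continuation value decreases by at most `t`, which is exactly the cancellation used in the proof.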



